SPARK tasking v2
In the previous iteration of this design, I proposed a decentralised tasking system requiring a membership service that provides a list of all active checker nodes. Will suggested a simpler solution that does not require a list of active members. (Slack discussion.)
Assumptions
- Checker nodes are (somewhat) uniformly distributed across all IPv4 /24 subnets.
- Whenever we create a committee of IPv4 /24 subnets, at least X% of the checker nodes in those subnets will be online; we can estimate the value of X, and it does not change too much over time.
Ideas
- Use consistent hashing to split the space of all IPv4 /24 subnets into buckets - committees.
- Extend the length of the SPARK round and let each committee perform multiple retrieval checks per round.
- Assign rewards to checker nodes retroactively after nodes report retrieval results. After the round is over, we know exactly which checker nodes participated in the round - that gives us the membership information.
- Within each subnet and for each job, assign the reward to the node inside that subnet that is “closest” to the job, using the XOR distance between the node’s public key and the hash of the job definition. (See below for details.)
Benefits
- We can have a network of tasker nodes that don’t share any state.
- Tasks are computed deterministically. The fraud detection algorithm can re-run the tasking algorithm to verify which tasks are legitimate.
The algorithm executed by the tasking node
Parameters
- CC - “committee count” - how many committees we want to form. Initially, this will be a hardcoded parameter. We will manually tweak it over time based on the number of active checker nodes.
- TC - “tasks per committee” - how many tasks to define for each committee.
- TN - “tasks per node” - how many tasks a node performs. This value should be a fraction of TC, e.g.
TN = TC/5
We need to find a good balance between TC & TN.
- We want TC to be high to maximise the probability that every checker node receives at least one reward.
- We want to keep TN relatively low to limit the bandwidth consumed by checker nodes and by SPs serving our retrievals.
- However, we may end up with not enough redundant measurements with high TC and low TN, which means we won’t have confidence that the majority is honest.
All three parameters will evolve over time, so we must figure out what telemetry data we must collect to allow us to reason about the suitability of current CC/TC/TN values.
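To make the TC/TN trade-off concrete, here is a small worked example with illustrative (made-up) values, estimating the expected measurement redundancy per task:

```javascript
// Illustrative numbers only - the real values will be tuned over time.
const TC = 50 // tasks per committee (hypothetical)
const TN = TC / 5 // tasks per node, using the suggested fraction -> 10
const nodesPerCommittee = 30 // active checkers in one committee (hypothetical)

// If each node performs TN of the committee's TC tasks and the task selection
// spreads nodes roughly uniformly across tasks, each task is expected to be
// measured by nodesPerCommittee * TN / TC checkers.
const expectedRedundancy = nodesPerCommittee * TN / TC
console.log(expectedRedundancy) // -> 6 measurements per task
```

With only ~6 expected measurements per task in this example, a handful of colluding nodes in one committee could already sway the majority - this is the redundancy concern described above.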
Inputs
- SPARK round
- The IPv4 address of the checker node that’s asking for tasks
Step 1: DRAND Randomness
Inputs
- SPARK round number
Steps
- Map the SPARK round to the DRAND round, using genesis time & round length of SPARK & DRAND.
- Obtain the 256 bits of randomness via the DRAND public API.
Outputs
- DRAND randomness (256 bits, the SHA-256 hash of the signature)
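The round mapping can be sketched as follows. All constants below are illustrative assumptions, not verified production values - the real genesis times and round durations are configuration parameters:

```javascript
// All constants are illustrative assumptions.
const SPARK_GENESIS = 1_700_000_000 // unix seconds when SPARK round 0 started (hypothetical)
const SPARK_ROUND_DURATION = 1_200 // 20-minute SPARK rounds (hypothetical)
const DRAND_GENESIS = 1_595_431_050 // drand mainnet genesis time (assumed)
const DRAND_PERIOD = 30 // seconds per drand round (assumed)

// Map a SPARK round to the latest DRAND round published at or before the
// start of that SPARK round (drand round 1 is published at genesis).
function sparkToDrandRound (sparkRound) {
  const sparkStart = SPARK_GENESIS + sparkRound * SPARK_ROUND_DURATION
  return Math.floor((sparkStart - DRAND_GENESIS) / DRAND_PERIOD) + 1
}

// Each SPARK round advances the DRAND round by a fixed step:
console.log(sparkToDrandRound(1) - sparkToDrandRound(0)) // -> 40
```

The randomness itself is then fetched from the public DRAND HTTP API for the computed round number.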
Step 2: Committee Selection
- Calculate the subnet’s position on the hash ring:
const subnet = ipv4_address.subarray(0, 3) // first three bytes of the node's IPv4 address
const data = concat(randomness, subnet)
const ring_index = await crypto.subtle.digest("SHA-256", data)
- Calculate the bucket size for each committee:
ALL_SUBNETS := 2^24; bucket_size := ALL_SUBNETS / CC;
- Compute the committee the node belongs to:
committee := ring_index / bucket_size; // using big-number arithmetic
Assuming the number of committees is a power of two, i.e. CC = 2^N and CC < 256, this simplifies to:
committee := most_significant_byte_of(ring_index) / (256 / CC)
Outputs
- Index of the committee this node belongs to.
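Putting step 2 together, a minimal sketch (using Node’s `node:crypto` for brevity; `CC` is assumed to be a power of two below 256, as in the shortcut above):

```javascript
import { createHash } from 'node:crypto'

// CC is assumed to be a power of two, CC < 256.
function committeeIndex (randomness, ipv4, CC) {
  // The first three octets of the IPv4 address identify the /24 subnet.
  const subnet = Buffer.from(ipv4.split('.').slice(0, 3).map(Number))
  // Position of the subnet on the hash ring for this round.
  const ringIndex = createHash('sha256')
    .update(Buffer.concat([randomness, subnet]))
    .digest()
  // With CC = 2^N, the committee is determined by the most significant byte.
  return Math.floor(ringIndex[0] / (256 / CC))
}

// Two nodes in the same /24 subnet always land in the same committee:
const randomness = Buffer.alloc(32, 1) // stand-in for DRAND randomness
console.log(
  committeeIndex(randomness, '203.0.113.7', 8) ===
  committeeIndex(randomness, '203.0.113.250', 8)
) // -> true
```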
Step 3: Committee Tasking
We want to define TC different tasks (retrieval checks) for each committee. The particular implementation will depend on how we implement CID sampling. The implementation details are not important here.
Outputs:
- A list of TC tasks (pairs of (CID, SP)) for the given committee and round.
The algorithm executed by the checker node
- Determine the current SPARK round.
TBD: How can a checker node reliably determine what’s the current SPARK round?
- If it’s time-based, then we need a reliable source of current time - the internal clock on desktop computers may be wrong.
- Can the tasker nodes provide API for this?
- Ask the tasker for the DRAND randomness, committee index, and the list of tasks for the current SPARK round. (We obtained the round number in step 1.)
- Pick the TN closest tasks from the list, using the following algorithm to calculate the distance to (cid, sp):
distance := XOR(
  digest("SHA-512", concat(randomness, cid, sp)),
  checker_public_key, // the node's ed25519 public key
)
- For each task in the list, and while we are still within the time window of the current SPARK round, perform the retrieval and report the results.
Q: Do we want to enforce serial execution of retrieval checks, or is it fine if a checker fires all retrieval requests immediately? Serial execution should spread the load on SPs over a longer time period.
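The task-selection step can be sketched as follows. Note that raw ed25519 public keys are 32 bytes while SHA-512 digests are 64, so this sketch pads the key to the digest length before XOR-ing - how the two lengths are reconciled is an open detail of the design:

```javascript
import { createHash } from 'node:crypto'

// Rank tasks by XOR distance between SHA-512(randomness || cid || sp) and the
// checker's public key, then pick the TN closest ones. Comparing the XOR'd
// buffers lexicographically matches comparing big-endian distances.
function pickClosestTasks (tasks, randomness, publicKey, TN) {
  const key = Buffer.alloc(64) // pad the 32-byte key to SHA-512 digest length
  Buffer.from(publicKey).copy(key)
  const distanceTo = ({ cid, sp }) => {
    const digest = createHash('sha512')
      .update(Buffer.concat([randomness, Buffer.from(cid), Buffer.from(sp)]))
      .digest()
    const dist = Buffer.alloc(64)
    for (let i = 0; i < 64; i++) dist[i] = digest[i] ^ key[i]
    return dist
  }
  return tasks
    .map((task) => ({ task, distance: distanceTo(task) }))
    .sort((a, b) => Buffer.compare(a.distance, b.distance))
    .slice(0, TN)
    .map(({ task }) => task)
}
```

Because the distance depends only on the round’s randomness, the task and the public key, every party (checker, tasker, evaluator) computes the same ranking.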
Outputs (measurement record) - per each task
- SPARK round
- Checker’s identity (public key)
- Checker’s IPv4 address, in a form we can trust
- Initially, this will be recorded by the measurement service we are running.
- We need to find a trustless solution in the future. For example, tasker services can use their identity to sign (randomness, ipv4). There may be better options, like leveraging TCP or QUIC handshake messages - that is a research task for later.
- CID
- SP
- other fields as described in
Verification of measurements by the evaluation service
After the SPARK round is over and all measurements are recorded, we can gather membership information from the reported data and decide which measurements will be used for further proof verification and become eligible for rewards.
- Determine the SPARK round we are evaluating.
- Build a list of all IPv4 subnets that participated in this round.
- For each IPv4 subnet, compute the committee index it belongs to. (See Step 2: Committee Selection.)
- For each committee index seen, obtain the list of TC tasks assigned to this committee.
- This depends on how we implement CID sampling.
- For the initial version, the tasking service can provide an API to retrieve the list of all committees and their tasks for the given SPARK round.
get_tasks(round) -> Map<CommitteeIndex, Array<Task>>
- Build a lookup table allowing us to quickly check whether (subnet, cid, sp) is a valid task.
- For each measurement, check if (subnet, cid, sp) is valid. Filter out records that don’t pass.
- Group all measurements by their (subnet, cid, sp) triple.
- For each group (subnet, cid, sp):
- Find the closest measurement, using the same distance algorithm as described above:
distance := XOR(
  digest("SHA-512", concat(randomness, cid, sp)),
  checker_public_key, // the node's ed25519 public key
)
- Flag the closest measurement as valid; filter out all other measurements for the same task.
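Steps 5-8 above can be sketched as a lookup table plus a single pass over the measurements. `committeeOf` and `distanceOf` below are hypothetical helpers standing in for the committee-index and XOR-distance computations described earlier:

```javascript
// measurements: [{ subnet, cid, sp, ... }]
// tasksByCommittee: Map<committeeIndex, Array<{ cid, sp }>>
function evaluateRound (measurements, tasksByCommittee, committeeOf, distanceOf) {
  // Step 5: lookup table of valid (subnet, cid, sp) triples.
  const validTasks = new Set()
  for (const { subnet } of measurements) {
    const tasks = tasksByCommittee.get(committeeOf(subnet)) ?? []
    for (const { cid, sp } of tasks) validTasks.add(`${subnet}|${cid}|${sp}`)
  }
  // Steps 6-8: drop invalid measurements, group by (subnet, cid, sp),
  // and keep only the closest measurement in each group.
  const winners = new Map()
  for (const m of measurements) {
    const key = `${m.subnet}|${m.cid}|${m.sp}`
    if (!validTasks.has(key)) continue // step 6: not a valid task
    const best = winners.get(key)
    if (!best || distanceOf(m) < distanceOf(best)) winners.set(key, m)
  }
  return [...winners.values()]
}
```

The returned list satisfies the output properties below: every surviving measurement is for a valid task, and there is at most one measurement per (subnet, cid, sp).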
Output
List of measurements with the following properties:
- Each measurement is for a valid task
- For each task, there is at most one measurement per IPv4 subnet.
Computational Complexity
TL;DR:
- Worst case: O(M * log(M)), where M is the number of measurements submitted for the given epoch
- Optimistic: O(N * log(N)), where N is the number of checker nodes participating in the given epoch
Details
- We need to iterate through all measurements multiple times - O(M).
- We compute the committee index for each IPv4 subnet. Assuming the number of subnets actively participating in the round is S, the complexity is O(S). However, S is limited to 2^24 (~16 million).
- In step 5, we want to build a set of all valid (subnet, cid, sp) triples. There will be at most TC valid tasks for each subnet, which gives us TC*S triples. Assuming a data structure with logarithmic complexity for inserts and lookups, step 5 has the complexity O(TC*S*log(TC*S)).
- In step 6, we are filtering out invalid measurements using the lookup table from step 5. This has the complexity O(M*log(TC*S)). While TC*S is capped, M is not.
- In step 7, we want to group valid measurements into groups by (subnet, cid, sp). A naive algorithm that sorts the measurements by comparing (subnet, cid, sp) to create an array where groups are laid out next to each other requires O(M*log(M)).
- In step 8, we want to find the closest measurement in each group.
- In each group of measurements, we can find the closest measurement in linear time.
- Therefore, the entire step 8 requires linear time, too → O(M).
Where:
- M is the number of measurements submitted for the given epoch
- S is the number of IPv4 subnets participating in the given epoch
- TC is our parameter (number of tasks per committee)
- N is the number of checker nodes participating in the given epoch
Invariants:
- M ≥ N ≥ S
Worst-case
Combined, we are getting the following worst-case complexity:
O(M + S + TC*S*log(TC*S) + M*log(TC*S) + M*log(M) + M)
Which can be simplified to
O(M*(log(TC*S)+log(M)) + S + TC*S*log(TC*S)) =
O(M*log(TC*S) + M*log(M) + S + TC*S*log(TC*S)) =
O(log(TC*S)*(M + TC*S) + M*log(M))
Treating TC as a constant:
O(log(S)*(M+S) + M*log(M)) =
O(M*log(S) + M*log(M) + S*log(S)) = # M >= S
O(M*log(M))
Optimistic
In the optimistic case:
- Each node submits TN measurements, therefore M = N * TN.
- Nodes are equally distributed between subnets, therefore S = O(N).
Note that TN is a fraction of TC, so TN = O(TC) and we can use TC in place of TN as the coefficient inside the big-O bounds.
The time complexity is
O(N * TC * (log(TC*N) + log(N * TN)) + N + TC*N*log(TC*N)) =
O(N * TC * (log(TC) + log(N) + log(N) + log(TN)) + N + TC*N*log(TC*N)) =
O(N * TC * (log(TC) + log(TN) + log(N)) + N + N * TC * (log(TC) + log(N))) =
O(N * TC * (log(TN) + log(TC) + log(N)) + N)
Assuming N >> TN and N >> TC and treating both TN and TC as constants:
O(N * log(N))
Space (RAM) Complexity
I am excluding the space to store the raw measurements.
TL;DR:
- O(S + M), where S is the number of participating IPv4 subnets and M is the number of measurements.
- Keeping all measurements in memory will quickly become infeasible. We need to research an algorithm for finding the closest measurement in each group that will trade time for space, preferably leveraging features offered by the database storing the measurements.
Details
- In steps 2 & 3, we build a list of all participating IPv4 subnets - that’s O(S).
- In step 4, we obtain a list of tasks for each committee - that’s O(CC*TC) (a constant).
- In step 5, we build a set of all valid (subnet, cid, sp) triples. There will be at most S*TC triples to store - O(S*TC).
- In step 7, we group measurements and then run some more steps for each group. We can store all groups in a list of all measurements - O(M).
Where:
- M is the number of measurements submitted for the given epoch
- S is the number of IPv4 subnets participating in the given epoch
- TC is our parameter (number of tasks per committee)
- N is the number of checker nodes participating in the given epoch
Invariants:
- M ≥ N ≥ S
All together:
O(S + CC*TC + S*TC + M) = # next, ignore constants
O(S + M)